Articles Joshua 6 : A phrase - based and hierarchical statistical machine translation system

نویسندگان

  • Martin Popel
  • Ondřej Bojar
  • Jaroslav Peregrin
  • Patrice Pognan
  • Matt Post
  • Yuan Cao
  • Gaurav Kumar
  • Miloš Stanojević
  • Khalil Sima’an
  • Ulrich Germann
  • Eleftherios Avramidis
  • Aljoscha Burchardt
  • Duc Tam Hoang
چکیده

We describe the version six release of Joshua, an open-source statistical machine translation toolkit. The main difference from release five is the introduction of a simple, unlexicalized, phrase-based stack decoder. This phrase-based decoder shares a hypergraph format with the syntax-based systems, permitting a tight coupling with the existing codebase of feature functions and hypergraph tools. Joshua 6 also includes a number of large-scale discriminative tuners and a simplified sparse feature function interface with reflection-based loading, which allows new features to be used by writing a single function. Finally, Joshua includes a number of simplifications and improvements focused on usability for both researchers and end-users, including the release of language packs— precompiled models that can be run as black boxes.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Accurate Non-Hierarchical Phrase-Based Translation

A principal weakness of conventional (i.e., non-hierarchical) phrase-based statistical machine translation is that it can only exploit continuous phrases. In this paper, we extend phrase-based decoding to allow both source and target phrasal discontinuities, which provide better generalization on unseen data and yield significant improvements to a standard phrase-based system (Moses). More inte...

متن کامل

Joshua 6: A phrase-based and hierarchical statistical machine translation system

We describe the version six release of Joshua, an open-source statistical machine translation toolkit. The main difference from release five is the introduction of a simple, unlexicalized, phrase-based stack decoder. This phrase-based decoder shares a hypergraph format with the syntax-based systems, permitting a tight coupling with the existing codebase of feature functions and hypergraph tools...

متن کامل

LIUM's statistical machine translation systems for IWSLT 2009

This paper describes the systems developed by the LIUM laboratory for the 2009 IWSLT evaluation. We participated in the Arabic and Chinese/English BTEC tasks. We developed three different systems: a statistical phrase-based system using the Moses toolkit, an Statistical Post-Editing (SPE) system and a hierarchical phrase-based system based on Joshua. A continuous space language model was deploy...

متن کامل

Automated Grammar Correction Using Hierarchical Phrase-Based Statistical Machine Translation

We introduce a novel technique that uses hierarchical phrase-based statistical machine translation (SMT) for grammar correction. SMT systems provide a uniform platform for any sequence transformation task. Thus grammar correction can be considered a translation problem from incorrect text to correct text. Over the years, grammar correction data in the electronic form (i.e., parallel corpora of ...

متن کامل

Hierarchical Phrase-Based Statistical Machine Translation System

The aim of this thesis is to express fundamentals and concepts behind one of the emerging techniques in statistical machine translation (SMT) hierarchical phrase based MT by implementing translation from Hindi to English. Basically hierarchical model extends phrase based models by considering subphrases with the aid of context free grammar (CFG). In other models, syntax based models bear a rese...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015